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Abstract 

The regulation of the cell state is a complex process involving several 
components. These complex dynamics can be modeled using Boolean net- 
works, allowing us to explain the existence of different cell states and the 
transition between them. Boolean models have been introduced both as spe- 
cific examples and as ensemble or distribution network models. However, 
current ensemble Boolean network models do not make a systematic distinc- 
tion between different cell components such as epigenetic factors, gene and 
transcription factors. Consequently, we still do not understand their relative 
contributions in controlling the cell fate. In this work we introduce and study 
higher order Boolean networks, which feature an explicit distinction between 
the different cell components and the types of interactions between them. We 
show that the stability of the cell state dynamics can be determined solving 
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the eigenvalue problem of a matrix representing the regulatory interactions 
and their strengths. The qualitative analysis of this problem indicates that, 
in addition to the classification into stable and chaotic regimes, the cell state 
can be simple or complex depending on whether it can be deduced from the 
independent study of its components or not. Finally, we illustrate how the 
model can be expanded considering higher levels and higher order dynamics. 
Keywords: cellular automata, complex systems, epigenetics, transcription, 
protein modifications 



1. Introduction 

Regulation of gene expression is a complex process involving several com- 
ponents of different type, such as epigenetic factors, gene and transcription 
factors. Modeling such a complex system requires us to find the balance 
between the accuracy of the model predictions and our ability to interpret 
the model. On one side of the model spectrum, we have detailed chemical 
kinetics or Boolean network models 



Jong 



(120021 ) . In these approaches the cell 



component heterogeneity is build in when specifying the regulatory interac- 
tions, functions (kinetic models or Boolean functions), and associated model 
parameters. Provided we determine all the regulatory interactions, functions 
and parameters correctly, these models can allow us to make accurate pre- 
dictions of the cell state dynamics. However, detailed models can be queried 
only by means of numerical simulations, making it difficult to uncover or 
understand any behavior that is not known in advance. On the other end of 
the model spectrum we have ensemble models, which specify the statistical 
distributions of the regulatory interactions, functions, and associated model 
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parameters. While these models cannot provide precise predictions about 
specific cell processes, they can allow us to understand what is the typical 
behavior and how it can change under variation of the model parameters. 



Within this class of models, ensemble Boolean networks 



rave been studied 



the most 


Kauffmanl 


(1969) 


Aldana and Cluzel 




>003); 



Derrida and Pomeau 



Kauffman et al. 



fl2004f l. 



(1986) 



Kauffmanl (119931 ): 



The analysis of ensemble Boolean networks has signific antly contr i butec 
to ou r qualitative understanding of the cell state dynamics iKauffmanl ( 119691 . 



19931 ). Different cell states can be associated with differe nt stable attractors 



of the Boolean network dynamics 
the br e akdown of this stabil i ty fof 



Kauffmanl (11968 . 



19931 ) and we can study 



( Il986l ): lAldana and Cluzell ( l2003h : 



o wing parameter changes 



Derrida and Pomeau 



Kauffman et al. 



(120041 ) . More recently it 



is becoming clear that not all transcription factors regulating a given gene 
are equivalent. This is being modeled using Boolean fun ctions with a biolog - 



ically meaningfu l struc ture, such as canalyzing functions 



Kauffman et al. 



Harris et al. 



(2002); 



Kauffman et al 



(|200J) 



(120041 ) and nested canalyzing functions 
However, at the system level, the current ensemble Boolean network models 
typically comprise all elements they consider into one class of objects, within 
which the interactions are determined. This makes it difficult to model the 
general behavior and influence of different groups of elements (cell compo- 
nents) and the different types of interactions which systematically occur be- 
tween elements of these components. 

We introduce a more general class of Boolean networks with an explicit 
distinction between epigenetic factors, genes and transcription factors and 
the types of interactions among them. We call this class of Boolean networks 
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higher order Boolean networks (HOBN), in the sense that we specify wiring 
diagrams both within the three groups and between them, determining type- 
level interactions. We use HOBN to investigate the relative contribution of 
the different cell components to the cell state dynamics. 



2. Higher order Boolean network model 

We model the interaction of three different types of cellular components 
determining the cell state (Fig. [[]): a set of epigenetic factors £, a set of 
genes or transcripts Q, and finally a set of transcription factors or proteins 
V. 

Epigenetic factors form the most basic elements of our system, represent- 



i ng ch emical modifications of the DNA and histone tails 



Jaenisch and Bird 



( 120031 ). The list of all such factors for a cell genome is the set £. They are 
distributed along the DNA forming a linear graph. Within this topological 
graph they have neighbors influencing their states. Each epigenetic factor 
e G £ thus comes with a neighborhood £ e of other elements in £, describing 
a linear order. The elements in this neighborhood influence the epigenetic 
state of the system. Epigenetics can be influenced also by single and com- 
posite gene products which can alter for example methylation patterns. We 
model this by assigning to each factor e G £ not only its direct neighbors 
within £, but also elements in the set of genes Q and in the set of transcrip- 
tion factors V. Thus an element e G £ has three neighborhoods £ e) Q e and 
V e which regulate its state. The epigenetic factors are assumed to be in two 
possible states (e.g. not methylated) and 1 (e.g. methylated). We model 
the control of the epigenetic state by a distribution of Boolean functions fe 
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on the set S which each take as input the states of all three neighborhood 
elements of a given epigenetic factor, and give as output the updated state 
or 1. We think of epigenetics as the most elementary units of the system since 
these factors act upon genes (and consequently on transcription factors) in a 
dominant fashion: an adequate epigenetic state is a prerequisite of all other 
transcription and translation activities, including up- or down-regulation of 
transcription by various factors. 

A gene or transcript g G Q represents any genomic region that can be 
transcribed, including mRNAs of genes, miRNAs and short RNAs. The ac- 
tive state indicates the gene presence. The state of a gene is regulated by 
epigenetic factors, other genes and transcription factors, however in different 
ways. Epigenetic factors determine the secondary DNA structure in their 
neighborhood and the accessibility of this region to transcription factors and 
the transcription machinery. Thus the aggregate epigenetic state of these 
factors on and nearby the DNA segment encoding for a gene can influence 
the gene's transcription rate, and thus the gene state. We model this by 
introducing two different transcription regimes characterized by two Boolean 
functions fg and fg, where fg corresponds to the silent or restricted regime 
and fg corresponds to the active or accessible regime. To model the epi- 
genetic regulation of the transcription regime of a gene we define the set of 
all possible transcription regimes 1Z (here, 1Z = {fg , fg}), together with an 
additional Boolean function f n which controls the change of transcription 
regimes. This function fn takes as input the state of the epigenetic factors 
associated to a gene £ g and the value of its present regime (fg or fg) and 
determines as an output the value of the regime for the next step. Once 
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a regime is determined depending on the epigenetic state of the gene, the 
regime function will update the gene state. This update now happens de- 
pending on gene-gene interactions and transcription factor activities. Thus 
the inputs of the Boolean functions in 1Z are taken from the set of genes and 
transcription factors which regulate transcription of a gene. This means that 
each gene comes also with a neighborhood of other genes and a neighbor- 
hood of transcription factors regulating its state. The gene neighborhood of 
a given gene g will be called Q g and we denote by V g its transcription factor 
neighborhood. The state of g is now regulated by the states of elements in 
Q g and V g through a transcription regime, fg or fg, the choice of which is 
decided by the epigenetic state £ g of the gene. The model thus allows us to 
control epigenetic effects separately from the regulation by other transcripts 
and transcription factors. 

The last group of cellular agents consists of the set of transcription factors 
V, representing proteins and protein complexes. Each element p in V is 
composed of products of a subset of genes Q p . In order for a transcription 
factor p to be assembled, all the transcripts in Q p need to be transcribed 
and translated. This procedure - which enables or disables the activity of 
transcription factors - is again best modeled by a regime switch. We assume 
the transcription factors can be in two different regimes characterized by 
the Boolean functions f^ and /p, where fp = corresponds to the not 
assembled complex and f£ to the regime of the assembled complex. The 
set of all regimes for protein complexes is denoted by C. The choice of 
regime for a protein p will depend on the states of all elements in Q, p via 
a Boolean function f c . This regime switch function f c is simply a logical 
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Figure 1: Higher order representation of the cell components and their regulatory inter- 
actions. The square boxes indicate the sets of Boolean functions controlling the states 
of the elements, the arrows entering a box indicate the input of these functions. In the 
case of gene expression, the choice of function is regulated by epigenetic state; in the case 
of transcription factors, the choice of function is regulated by expression of components 
(AND-function). 



AND relation, since the transcription factor works under /— if and only if 
all its components are transcribed, i.e. if and only if all inputs of fc are in 1- 
state. Here the 1-state of fc stands for f£ while stands for f^. Within the 
positive regime f£ the state of an element in V can depend on interaction 
with various other elements in V itself; for example via post-translational 



modifications 



Walsh et al. 



( 120051 ). These form the neighborhood V v of p G V. 
The functions in the regime /p thus take as input the states of elements in 
this neighborhood. 



The cellular components and regulatory interactions just described are 
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summarized in the wiring diagram shown in Fig. [TJ This diagram empha- 
sizes the level- wise organization of the different types of cellular components. 
From top to bottom we have regime regulation (dashed lines). From bottom 
to top we have state regulation, as considered by previous Boolean network 
models (solid lines). When both types of interactions are put together we ob- 
tain a closed system, a sys tem with feedback, which i s a distinctive property 



of cell regulatory networks 



Barabasi and Oftvail (120041 ) . Following the central 



dogma, we will often refer to epigenetics, i.e. the set £, its elements, neigh- 
borhoods and Boolean rule, as the 0-level of the system. The set Q together 
with the two Boolean rules, neighborhoods, and associated epigenetic factors 
constitute the 1-level, while finally the set of transcription factors V together 
with its own functions, neighborhoods and associated genes will be called 
the 2-level. We will also sometimes refer to the interactions regulating state 
changes as primary interactions, while those regulating a choice of regimes 
will be called secondary interactions. This is also in referen ce to the existing 



notion of higher order cellular automata, as introduced by 
(j2005n . 



Baas and Hclvik 



3. Cell state dynamics 

Previous studies of Boolean network models indicate the existence of two 
dynamical modes. An ordered mode where two different trajectories in the 
cell state space will converge to the sa me trajectory, and a c h aotic mode where 
the trajectories will instead diverge iDerrida and Pomeaul (119861 ) . Later on 
it was shown that the ordered mode implies a nearly stati c system behavio r 
where most elements (stable core) are not changing state iFlyvbjergi ( 119881 ). 
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Here, we follow the latter approach. The total state of our system is expressed 
by five state variables: the epigenetic state e(t) for e G S, the gene state g{t) 
for g G Q, the transcription factor state p(t) for p G V, and furthermore the 
transcription regime state r(t) and the protein regime state c(t), where in the 
latter two cases the — and + states are represented by and 1 respectively. 
The basic set-up for the system dynamics thus consists of five equations ex- 
pressing the probability of changes to happen in all of the five state variables. 
We can think of the total probability of change of state of the system as a 
five-dimensional vector q\t) = (qo(t),qi(t),q2(t),Qi(t),Q2(t)) t , where qo(t), 
qi(t), q2(t), Qiif) and Q2(t) are the probability that a given epigenetic, gene, 
transcription factor, transcription regime, and protein regime state respec- 
tively, will change from step t to step t+1. 

In the general case q(t + 1) is a nonlinear function of q(t), which depends 
on the detailed definition of the Boolean model. Nevertheless, in the nearly 
static, ordered mode, where most elements do not change state, this function 
can be linearized in good approximation: in this range the absolute value of 
the total probability for change in the system is very close to 0, i.e. \q\ — > 0, 
which allows us to neglect higher exponent terms. Note that this linear 
approximation of the system dynamics breaks down outside the ordered mode 
and cannot predict any behavior there, other than the fact that the system 
is in an unstable mode. In the near-static range we thus obtain 

q(t+l)=Aq(t) , (1) 

where A is a five by five positive definite matrix. The entries in the matrix 
A reflect the regulatory patterns indicated in Fig. [1] and therefore it is of the 
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form 



an fli2 a i3 



\ 



a 22 a 23 



a 2 4 



.4 







a 3 3 



a 35 



(2) 



a 4 i 















\ a 52 




Each set in Fig. [T] which is acted on or regulated creates one state dimension 
in q. Each arrow indicating regulation is represented by one non-trivial entry 
in the matrix. Notice that the regime control interactions (dashed lines) are 
represented in the off-diagonal blocks, while the state control, or ordinary 
interactions (solid lines) appear in the upper triangular submatrix. 

In the following we focus on the stability of the linear map ([I]). Specifi- 
cally, the map is said to be stable when \q\ — > as t — > oo, and it is called 
unstable otherwise. The stability of a linear map can be deduced from the 
properties of the eigenvalues of the corresponding matrix, in this case A. 
When the largest eigenvalue has absolute value less than one the system is 
in stable or ordered mode and q(t) converges to zero. If however the largest 
eigenvalue becomes larger than one, this indicates that the system is in un- 
stable mode where the linearization ([1]) is not a suitable approximation to 
the actual system anymore. Using the linearization ([TJ), we can thus analyze 
the model within its 'stable range', i.e. in its near static mode. We can 
furthermore determine the conditions on parameter space which distinguish 
between stable and unstable range, by calculating the largest eigenvalue and 
setting it equal to one. Finally, we can analyze the derivative in time direc- 
tion to determine the influence of certain parameters on the growth of the 
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Cycle type A 





o 

Cycle D 



Figure 2: Effective dynamic modes 

largest eigenvalue. 

Since A is derived from a strongly connected graph (see Fig. [TJ), A is 
irreducible. Furthermore the entries of A are non-negative. In this case 
we can apply the F robenius Theorem for non-negative irreducible matrices 
Gantmacherl (120001 ) . This theorem guarantees that A has a real positive 
eigenvalue A such that any other eigenvalue A of A satisfies |A| < A. 

The largest eigenvalue of A can be obtained finding the roots of the 
characteristic polynomial P(X) = det(A7 — A), where / is the identity matrix. 
In our case P(X) is given by the quintic polynomia 



P(A) = X 2 (A £ - X)(A g - X)(A V - A) 

- X(A £ - X)B - X(A V - X)C + D (3) 

where 



A £ = an A g = a 22 A v = a 33 , 
B = CI23O35O52 C = 024041^12 j 

D = 024041013035052 • (4) 

The direct inspection of this equation tells us about the basic dynamic cy- 
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cles of the system (Fig. [2]). There are three minimal self controlling cycles, 
represented by the type A cycles in Fig. [2j The gene self-cycle represents 
the standard type of dynamics in ordinary Boolean network models, where 
genes are regulated by other genes in a regulation network. This is demon- 
strated in more detail in Sectional Similarly, the epigenetics self-cycle models 
how epigenetic factors update their states based on their previous states and 
the states of their neighbors along a DNA segment. The transcription factor 
self-cycle models changes in the transcription factors state due to interactions 
between them and therefore represents regulation at the post-translational 
level. There are two cycles composed of three components, represented by 
the cycles B and C in Fig. |2j B represents a dynamic mode where a change 
in gene state alters the protein regime state and thus the state of transcrip- 
tion factors. This in turn alters genes states, while epigenetic factors and the 
transcription regime remain unvariable. C involves state changes in epige- 
netic factors, transcription regime and genes. Finally, cycle D is composed of 
five elements, representing a dynamic mode where changes in the epigenetic 
states result in changes of transcription regime, leading to changes in the 
genes states, thus altering the transcription factor regime and transcription 
factor states, which then go on and change the epigenetic factor states. 

Although we cannot explicitly solve the quintic polynomial equation de- 
termining the roots of (J3J), we can derive some general results. For example, 
the boundary separating the stable from the unstable regime is given by the 
equation P(l; Ag, Ag,Ap, B,C, D) = 0. We investigate also the relative in- 
fluence of the different cycles calculating the partial derivatives of A with 
respect to the effective coefficients in (plj (see Supp. Material). It can be 
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shown that in the stable range (A < 1), A is most sensitive in the direction 
of D. In other words the easiest way to make a transition to the unstable 
regime is to increase D. This result indicates that the dynamic mode D, 
which makes full use of all regulatory mechanisms, dominates over the other 
modes. Consequently all regulatory mechanisms are coupled together and 
their influence in the cell state dynamics cannot be analyzed independently 
of each other. This type of structural analysis can provide insight into the 
regulation of the system as a whole without even breaking the calculations 
down to the actual "microscopic parameters" characterizing the neighbor- 
hoods, memberships and the Boolean functions. Furthermore, provided we 
have quantitative estimate of the model parameters we can always numeri- 
cally compute the largest eigenvalue of A and determine whether the system 
is or is not in an stable regime. 

4. Neighborhoods, memberships, Boolean functions and updating 
schemes 

The matrix elements of A can be derived from the properties of neighbor- 
hoods, memberships and Boolean functions. In this way we can also inves- 
tigate the influence of "microscopic" parameters on the cell state dynamics. 
For example, let us assume that, given a type of elements and neighborhoods, 
all neighborhoods have the same size (in an ensemble network one will use 
the estimated mean value). In this case there are three neighborhood pa- 
rameters Kq, Kq and K% for the three types of neighborhoods (£ e , Q e , V e ) 
on the 0- level (epigenetics). On the 1-level (genes) there are two of those, 
K\ and Kj, and finally on the 2-level (transcription factors) there is only 



13 



one K\. These parameters also give the input lengths for the Boolean rules 
determining change of state on the three levels. Furthermore the change of 
regime rule fn on the 1-level has input length Mo, the number of "members" 
constituting the epigenetic state of a gene. Finally, the rule fc changing tran- 
scription factor regime on the second level has input length M 1; the number 
of transcription factor members. 

The Boolean functions can be characterized by t he probability that two 
different inputs result in a different output (sensitivity Shmulevich and Kauffn 
( 120041 )) and the probability that the output is 1 for a randomly chosen input. 
We assume the Boolean functions fs , fg, /p, and fn are randomly sampled 
from the function classes F £ , Fg, F v , and F n respectively, while f v =0 and 
fc = AND. These function classes are characterized by their expected sen- 
sitivity S£, Sg, Sj, = 0, sip, Sfi and sc and probability to be in the 1-state ps, 
Pg) Pv = 0) Pvi Pn an d Pc, resp ectively. The expected sens i tivitie s depend 



on the class of Boolean function 



Shmulevich and Kauffmanl (120041 ) . For ex- 



ample, when the functions are sampled from a distribution with a given p we 
have s = 2p(l — p). On the other hand, for fc = AND we obtain pc = p 1 ^ 1 
and sc = Pg 11 ' 1 , where pg = PnPg + i^ — PnjPg is the probability that a given 
input of the AND-rvle is in state 1. Finally, the matrix elements will also 
depend on the specific updating scheme. In the following we will assume a 
synchronous updating scheme, where the state of all elements in the systems 
are updated simultaneously. 

Overall, we have a system with 15 parameters. The coefficients a,ij can 
be derived from these parameters. Most of them can be determined as in 
previous Boolean network models, consisting of the product of the sensitivity 
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and the neighborhood size. This is the case for an = s £ Kq 1 (i = 1,2,3), 
a 2 i = s g K\~ l (i = 1,2), and a 33 = s v K\, where s g = p n Sg + (1 - pn)Sg 
and s-p = pcSy, are the expected sensitivities of the respective Boolean func- 
tions after accounting for regime changes. The coefficients characterizing the 
change in regimes can be calculated in a similar way, but replacing neigh- 
borhood sizes by membership sizes. This is the case for 041 = s-jiMq and 
052 = scM\. Finally, the coefficients representing state changes following 
regime changes are calculated differently. What matters in this case is the 
probability that two Boolean functions from different regimes result in a dif- 
ferent output given the same input. This is the case for a 2 4 = s(fg 4-> fg) = 

Pgi 1 ~ Pg) +Pg( 1 ~ Pg) and a ss = s (fv fv) = Pvi 1 ~ Pv) + Pvi 1 ~ Pv)- 
Taking these results together we obtain 



a n = s £ Kq , a 12 = ssK] , a 13 = s £ Kq , 

a 22 = s g K{ , a 23 = sgKl , a 24 = - Pg) + P g (1 - pg) , 

a 3 3 = s T pg x Kl , a 35 = p% 

041 = s n M , a 52 = M^- 1 (5) 
From the Frobenius Theorem it follows that the larg est eigenvalu e A is a 



monotonic increasing function of the matrix elements eiy iGantmacherl (120001 ) . 
Therefore we can always investigate the contribution of any of the parameters 
listed above by analyzing their influence on the matrix elements a^. For 
fixed sensitivities, the increase of any neighborhood size K\ always result 
in the increase of at least one matrix element, pushing the system towards 
the unstable regime. In contrast, the number of members of a transcription 
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factor Mi influences a 33 and a 52 (J5J), which are both decreasing functions of 
Mi provided Mi > 1. Therefore, the larger the protein complexes are, the 
more stable is the system. 

The Frobenius Theorem also provides bounds for the largest eigenvalue, 
namely by the minimum and ma ximum of th e row sums of the matrix A: 
minj (X/j a ij) — A < maxj(XL <%) iGantmacherl (120001 ) . In the current exam- 
ple we obtain 



^(i^+i^ + i^ 2 ) , 

Sg{K\ + it?) + S (/«7 /+ 

snM , 
{ Mip^ 1 , 



1 

2 
3 
4 
5 



(6) 



These sums put together all the contributions regulating the units states at 
each level. When all of them are smaller (or larger) than 1 we can guarantee 
that A < 1 (A > 1) and the system is in a stable state (or unstable state, 
respectively). However, when some are smaller and other are larger than 1, we 
are forced to compute A to determine the system stability. In other words, the 
system exhibits non-trivial complex behavior, where fundamental properties 
of the system as a whole are not directly coupled with the corresponding 
properties of its subsystems. 

5. Examples 

To illustrate the concepts introduced above we discuss a few examples, 
allowing us to emphasize the flexibility of this modeling framework and the 
importance of including the regulatory structure at the level of components. 
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Figure 3: Reduced versions of the HOBN depicted in Fig. [T]a) Standard Boolean 
network model, b) Boolean network model accounting for the formation of protein com- 
plexes and regulation at the protein level. 

Standard Boolean network: Here we show how we can reduce our model to 
compare directly to the standard Boolean network models considered in the 
literature so far, with the reduced wiring diagram shown in Fig. [3h. In this 
case, there is no regulation at the epigenetic level and at the transcription 
factor level, nor at the transcription regime level. The only dynamic mode is 
the self-cycle Ag in Fig. |2] and the characteristic polynomial (jHJ) is reduced 
to -P(A) = A 4 (A — Ag). In this case, the largest eigenvalue is A = Ag and 
therefore the effective parameter controlling stability is 9 = Ag. In particular, 
taking into account that Ag = 022 dlj), and assuming constant neighborhood 
sizes and a synchronous update we obtain 

e = s g K\. (7) 
This is precisely the result obtained for the classical standard Boolean net- 
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work model iKauffmanl ( 119691 . Il993f l. Thus, our approach contains the stan- 
dard Boolean network model as a special case. 

Regulation at the protein level: In the next example we study regulation 
on the transcription factor level Fig. [3b. As opposed to the first example, 
here we have proteins and protein compounds regulating transcription; this 
includes the assembly rule, i.e. the regime change on transcription factor 
level. The only non-zero matrix elements are those corresponding to arrows 
in Fig. 13b and the only relevant dynamic modes are the cycles Ap and 
B in Fig. |2j The characteristic polynomial ([3]) now reduces to P (A) = 
—A 2 (A 3 — A-pX 2 — B). While finding the roots of this cubic polynomial can be 
cumbersome, finding the stability condition is in this case straigthforward. At 
A = 1 we obtain the stability condition B + A-p = 1. Furthermore, since the 
largest eigenvalue A of A, Ap and B are all continuous increasing functions 
of the associated matrix elements of A, then A < 1 for B + Ap < 1 and A > 1 
when B + Ap > 1. So in this case, the effective control parameter for stability 
is given by 9 = B + Ap. In particular, assuming constant neighborhood and 
membership sizes and synchronous updates, from (j3J) and (jSJ) we obtain 

6 = s Q K\p+M x pf-' + s v K\pf (8) 

Notice that this formula consistently solves the following subtlety: instead 
of speaking of gene-gene regulation networks as in the previous example, one 
could actually distinguish between the gene and its products (proteins), and 
construct a network where single gene products regulate gene transcription. 
This would be a more accurate description of the biological reality, should, 
however, lead to the same results, since de facto we do not change any inter- 
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actions. In our model, this set-up would mean that we set Mi = 1, i.e. all 
protein complexes are single gene products; furthermore K\ = 0, i.e. there 
is no protein-protein regulation (only proteins regulating genes), and lastly 
K\ = Kl, since we have replaced the number of neighbor genes regulating a 
gene by the same number of their proteins. We obtain 9 = sqK\ = sgK\, 
which coincides perfectly with ([7]). Therefore, unless we account for the 
formation of protein complexes or some regulation at the protein level, we 
obtain the same effective control parameter of the standard Boolean network 
model. 

The addition of protein-protein interactions results in the second term in 
the r.h.s. of ([8]). Here - just as for the case of gene regulation - increasing 
the neighborhood size K\ increases 6, pushing the system towards the unsta- 
ble mode. On the other hand, 9 decreases exponentially with increasing the 
protein complex size Mi, making the system more stable. We see from the 
above formulas how the stability conditions change drastically depending on 
primary and secondary interaction parameters. These corrections emphasize 
the importance of considering the right structure in the modeling frame- 
work. Furthermore, although there is an increase in model complexity, we 
can still derive analytical results allowing us to obtain a better qualitative 
understanding of the cell state dynamics. 



6. Beyond three levels 



The system we analyzed is an example of a Higher Order Cellular Au- 



tomata 



Baas and Helvikl (120051 ) . or even more general, a higher order network 



(HON). The first ingredient of a HON is a hierarchical structure, the idea 
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Figure 4: Beyond three levels: General scheme to construct a HON with up to second 
order dynamics. The sets d represent a hierarchy of types, where higher level elements 
correspond to groups of lower level elements (members). Their states arc regulated by 
regimes /, (e.g. Boolean functions) taking as input states of neighbor elements at the 
same of higher levels (black arrows) . The transition between the regimes is controlled by 
second order dynamics fj taking as input the members state at level i — 1. 
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being that groups of agents can act together as an entity. This hierarchy is 
modeled by creating a new agent on the next higher level, as illustrated in 
Fig. HI Thus we have a collection of sets Cq,C\, ■ ■ ■ , A, of agents of differ- 
ent type, one set on each level. The hierarchical structure is expressed by 
assigning to each element on level n + 1 the set of its members on level n. 
Note that this is a fixed structure which will not change over time. The 
second ingredient consists again of neighborhoods. Neighbors of an element 
on level i are other agents who can take influence on the state of the element. 
They can be of the same type as the element itself (that is, level z-agents) 
or they can be higher level agents. So the neighborhood of a level i-agent 
consists of subsets in levels i and higher. The state of an element on level i 
is regulated by the states of its neighbors, which serve as input for a Boolean 
regime. In a higher order network we assign sets of regimes IZi on each level, 
in other words, the system now has regime states, or first order derived sys- 
tem states. The latter name corresponds to the fact that the rules describe 
change of state. The choice of regimes is regulated by second order rules. 
Input for these second order rules can be chosen depending on the context 
of the model. With this type of wiring we create type-level feed-back loops 
containing primary interaction through direct state control (neighbors) and 
secondary interaction through regime control (members). 

The configuration of the regime change rules at a given time step t can 
be thought of as second order derived system state: it describes the second 
order derivative of the state function. Naturally, if we allow choice again for 
these rules, we can extend this to third order. We would then pick another 
rule determining the "change of regime switch" and depending on states of 
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Figure 5: Third order dynamics: Extension of the regulatory structure to include third 
order dynamics. We could also imagine a situation where there are more than one regime 
switch rule associated with the elements at one level, as illustrated at the third level (box 
R' 2 ). The transition between these regime switchers can be controlled by a third order 
dynamics taking as input the state of elements at a lower level. 
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certain subsets of elements (see Fig. Our example concerning the cell 
state stops at the second order regulatory structure, but one can think fur- 
ther. For instance, as the chromatin strand folds within the nucleus, certain 
regions of DNA can bec ome inaccessible to both ep igenetic state modifiers 
and transcription factors iGrewal and Moazedl (120031 ). This process does not 



affect the epigenetic state of the folded regions. However, the procedure does 
cause silencing of the genes within the folded regions. In our model, this 
would mean that the folding process has locally eliminated epigenetic influ- 
ence on transcription. In other words, it has turned off the epigenetic switch 
between silent and accessible regimes. The epigenetic states are still there, 
but whatever they are, the transcription regime is silent. Thus the switching 
mechanism has been exchanged for a constantly silent one by a master-switch 
which depends on the folding structure. 



7. Discussion 

The study of Boolean networks allows us to understand the characteristic 
features of the cell dynamics despite the great complexity of cell regulatory 
networks. A fundamental pre-requisite to achieve this goal is the use of 
ensembles of Boolean networks whose average properties are representative 
of the cell behavior. It is clear that a multi-level system such as the one 
described above can as well be encoded as an ordinary Boolean network. 
However, such a network will be a very rare realization within the entire set 
of Boolean networks with no pre-defined level organization. In other words, 
the average properties derived from the study of ordinary Boolean networks 
are not representative of a cellular system with its natural hierarchical struc- 
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ture. Higher order Boolean networks are therefore a necessary step to obtain 
network ensembles with pre-defined level-wise structure, a distinctive feature 
of cell regulatory networks. 

Our main conclusion from a first qualitative calculation of a higher order 
model is quite striking. We can show that determining the stability of the 
cell state can be a simple or a complex problem depending on the stability 
condition for each level individually fl6]). When the stability condition is 
satisfied for all levels we can guarantee that the system is stable. Similarly, 
when the stability condition is not satisfied for any level we can assure that 
the system is in a unstable state. In these cases we would say that, although 
the system has a second order structure, it is simple, i.e. its state can be 
determined from the analysis of its components independently from each 
other. In contrast, in between the simple dynamical regimes described above, 
there is a third regime where some levels do and others do not satisfy the 
stability condition. In this latter case we cannot deduce the stability of the 
system from the analysis of the stability of each single level. The system 
is complex, i.e. we are forced to consider all levels at once to determine 
its stability. This evidence indicates that the cell can be in four different 
states: Simple stable when all levels satisfy the stability condition. This 
would imply that the probability for change in any of the cells components is 
zero or converges to zero. The system is complex stable or complex unstable 
when there is at least one level that satisfies and another that does not satisfy 
the stability condition, but the system as a whole is stable or unstable. This 
could represent for example somatic cells in multicellular organisms with 
tissue regeneration (e.g., humans), which are epigenetically stable but may 
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exhibit different dynamical behaviors at the gene and protein levels. Finally, 
simple unstable when all levels do not satisfy the stability condition. A 
potential example of this extreme case could be cancer cells, which manifest 
continuous transformations at the epigenetic, gene and protein levels. 

We can further draw first rudimentary conclusions on the factors that 
influence changes between these dynamic modes based on the linear analysis 
of near-stable regimes. Our set-up allows us to weigh the contributions of the 
different cell components against each other and determine their comparative 
influence using the control structure given by the type-level wiring. We show 
that the primary factor in regulating stability is a dynamic mode involving 
all cell regulatory mechanisms (cycle D in Fig. [2]), in particular also epige- 
netics. To our current knowledge there are so far no ensemble models in the 
literature which integrate epigenetic influence into gene expression in a sys- 
tematic fashion which separates the different regulation mechanisms on the 
system level. In our model we can distinguish epigenetic factors from other 
cell components and account for the special role of epigenetic transcription 
regulation in a biologically sensible and accessible way. 

Our approach also allows us to investigate the influence of "microscopic" 
parameters such as neighborhood sizes, membership sizes and Boolean func- 
tion properties. We obtain that the increase on neighborhood size, at any 
level, push the systems towards the unstable regime. In contrast, the increase 
in protein complex sizes makes the system more unstable. This mathemati- 
cal result has important biological implications. It tell us that if, during the 
course of evolution, both the number of regulatory interactions and the pro- 
tein complex sizes are increased, then the cell can remain in a nearly stable 
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regime. 

For the sake of simplicity we have focused our attention on Boolean mod- 
els. This framework can be generalized to the case when there are more 
than two states chosen from some alphabet. Mo re generally the states can 



take different values in algebraic groups or fields 



Jarrah and Laubenbacher 



( 120081 ). We have freedom to choose different updating regimes as well. In the 
linear regime the form of the matrix A is only determined by the topology of 
the wiring diagram, while the updating scheme just affects the actual values 



of the non-zero matrix elements. 



means of e xpressing intricate re- 



Baasl (120061 ) . We have shown here 



Higher order structures are a powerfu 
lations in regulatory networks of all kinds 
that structures of this kind are natural and adequate candidates for mod- 
eling biological processes. Such models are systematically more exact than 
single-level models since they formally represent patterns and types of reg- 
ulations in the correct way and allow us to resolve the relative contribution 
of the different cellular components. They are called to play an even more 
fundamental role when addressing problems at the multicellular level. We 
hope this work motivates further efforts towards the annotation of cell regu- 
latory networks, making an explicit distinction between the different cellular 
components, their level organization, and feedback regulation. 

Appendix: Partial derivatives of A 

To compute the derivatives of A with respect to As, Ag and A-p, we take 
into account that 
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dP 

ddij 



dP dA 

dA das a 



(9) 



where P^ is the characteristic polynomial associated with the minor of A 
after removing line i and column j. Furthermore 



dP 
dD 



-1 + 



dP dA 
dAdD ' 



(10) 



From these equations it follows that 



dP 



da 



■i.i 



dA 
dD ' 



(11) 



Now, since A is irreducible and non-negative, the largest eigenvalue of A is 
larger or equal than the larg e st eigenvalue of any submat rix of A (Frobenius 



Theorem 



Gantmacherl ( 120001 ) ; iDebreu and Hestreinl (119531 ) ) . The latter result 



implies that Pij(A) > 0. To show that P^ < 1 we need to inspect the precise 
form of Pij. For i — j — 1 we obtain 



Pn(A) = A(A 3 - (Ag + A V )A 2 + A g A v A - C . 
Since A > Aq (Frobenius Theorem) we have that 



(12) 



P U (A) < A(A 3 - (A g + A v )AA g + A g A v A - C) 

< A(A Z - A 2 g A-C) 

< A 4 . 



(13) 



For A < 1 we finally obtain that Pn(A) < 1 and therefore 
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dP 

OAf. 



dP dP 

< — 

5an dD 



(14) 



Following a similar analysis we obtain that 



dP dP < dP 
dAg da 2 2 ~ dD 



(15) 



dP 



dA 



v 



OP dP 
< — 

da 33 dD 



(16) 



On the other hand, for the derivatives with respect B and D we obtain 



OA 

dB 
dA 



dA 



-- A(A-A e )„ 

^ = A(A-A V )^- . 
dC v ' dD 



(17) 
(18) 



Now agai n A is larger than th e largest eigenvalue of any submatrix (Frobenius 



Theorem 



Gantmacherl ( 2000 )) and thus it is larger than any matrix element. 



In particular A > an = Ag and A > a 33 = A-p. Under the assumption A < 1, 
these results then imply that < A(A - A E ) < 1 and < A(A - A £ ) < 1, 
and therefore 



dA 

~dB 

dA 

dC 



< 



< 



dA 

dD 
dA 



(19) 
(20) 
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